Abstract

Draft genome sequences for several agricultural animals have been assembled only in recent years. Assembling an individual animal's entire genome sequence, or specific regions of interest, is increasingly important for agricultural researchers who want to compare animals with different performance at the genetic level. We review the current status of several sequenced agricultural species and suggest that next generation sequencing (NGS) technology, with its decreasing cost and increasing speed, can benefit agricultural researchers. With advanced NGS technologies, researchers could pinpoint genes and chromosomal regions that are more labile to the influence of environmental factors. A longer-term goal would be to address how animals respond at the molecular and cellular levels to different environmental conditions (e.g. nutrition). Once important genes and gene-environment interactions are revealed, the rate of genetic improvement can also be accelerated. NGS technologies should help animal scientists raise animals efficiently and better prevent infectious diseases, so that the overall costs of animal production can be decreased.
1. Current Status of Domestic Animal Reference Sequences

As the new genomics era matures, with large-scale genome research and the development of sophisticated bioinformatics tools that can be applied to the agricultural field, agricultural researchers should take advantage of new sequencing and mapping technologies. In recent years, the genomes of several domesticated livestock animals (chicken, pig, cow, sheep, and horse) have been partially or completely sequenced. In this review, we first examine the current sequencing status of several agricultural species. Next, we discuss the platforms used for genome sequencing, the tools available for mapping sequences to the genome, and several additional applications of next generation sequencing. We also list tools available for analyzing data from these additional applications.
Because its micro-chromosomes have a high recombination rate, the chicken is an ideal model for studying genetic linkage [1]. The chicken, represented by the Red Junglefowl (RJF), was the first livestock species to have its genome sequenced. The first draft of the chicken genome was built from an assembly with 6.6-fold whole-genome shotgun coverage, although the sex chromosomes were poorly annotated in that initial assembly [1,2]. The updated NCBI build 2.1, released recently, significantly improves the annotation of the sex chromosomes. Roughly 2.8 million chicken SNPs were identified [1,3,4] by comparing the base (wild type) RJF sequence assembly with a partial genome scan of three chicken breeds: a female layer (White Leghorn), a male broiler (Cornish), and a female Silkie. A moderate-density (60K) Illumina SNP BeadChip for commercial chickens (broilers and layers) containing 352,303 SNPs was designed, and additional SNPs not covered by the current chicken genome assembly (Gallus_gallus-2.1) were recently identified and selected [5]. The BBSRC ChickenEST Database (http://www.chick.manchester.ac.uk/) is the most comprehensive database [6,7] of ESTs/cDNAs for the chicken genome. The Chicken Variation Database (ChickVD) (http://chicken.genomics.org.cn/) was released for geneticists in 2005 [4]. It contains genes, variants, chicken orthologs of human disease genes, and quantitative trait loci (QTLs), which are stretches of DNA containing or linked to the genes that underlie a quantitative trait. Large-scale breeding research projects are still needed (http://www.nih.gov/science/models/gallus/).

In November 2009, the first draft (98% complete) of the pig (Sus scrofa) genome, assembled through a global collaborative effort, was released. The diploid pig genome has 38 chromosomes (including metacentric and acrocentric ones) and is roughly 2.7 × 10^9 bp. High-throughput fingerprinting and BAC (bacterial artificial chromosome) end sequencing (over 600,000 BAC end sequences) were both used to guide sequencing of the whole swine genome. Specifically, the restriction enzyme fingerprinting method [8] was used to construct a physical map of the swine genome from BAC clones. The sequence will be used as the basis for identifying genes that are important to pork production and/or involved in immune or physiological processes (http://www.sanger.ac.uk/about/press/2009/091102.html). The finished pig assembly will not only help researchers understand the complexity of the pig genome, but will also change pork production and breeding technology. The completed swine genome is also critical for research on human nutrition and disease, because pigs resemble humans in physiology and nutritional needs (http://www.sanger.ac.uk/).

The genome of taurine cattle was initially sequenced and assembled at approximately 7-fold coverage and published by the Bovine Genome Sequencing and Analysis Consortium in April 2009. This initial assembly reported roughly 22,000 genes and 14,345 orthologs shared among seven mammalian species [9]. The Bovine Genome Sequencing Project, led by the Baylor College of Medicine Human Genome Sequencing Center in Houston, Texas, released an improved assembly version of the cow genome (Btau_4.2) in 2009. The BCM4 assembly was constructed with the Atlas assembly program [10]. The UMD2 assembly, from Steven Salzberg and his colleagues in Baltimore, Maryland, was constructed from NCBI trace data and refined with several modified assembly and mapping tools. Roughly 24 million reads from whole-genome sequencing and 11 million reads from BACs were used to create the UMD2 assembly [11]. The Salzberg lab recently created an updated assembly (UMD3.1) of 2.86 billion base pairs with 9.5x coverage of the genome [11]. Despite these efforts, the cow genome is still not completely assembled.

The Illumina BovineSNP50 is a high-density, genome-wide genotyping array. The v2 BeadChip contains 54,609 SNPs covering major breed types, and its probes were validated in 19 common beef and dairy breeds. This makes research such as QTL discovery and genetic improvement possible (http://www.illumina.com/products/bovine_snp50_whole-genome_genotyping_kits.ilmn). Although BovineSNP50 has been used successfully, several new chips have been or are being designed. In addition to the BovineSNP50 SNPs, the Bovine High-Density (HD) BeadChip (778K SNPs) includes some Y-specific and mitochondrial SNPs. Other chips, such as the Bovine Low-Density (3K) BeadChip, a 96-SNP parentage chip, a 384-SNP chip, and a 700K-SNP Affymetrix chip, were designed for different genetic purposes (http://www.slideserve.com/Download/143258/Walking-the-Cattle-Continuum-Moving-From-the-BovineSNP50-to-Higher-and-Lower-Density-SNP-Panels). A new collaborative project between the Australian beef and dairy industries and international partners is constructing a database of functional polymorphisms and sequence information on 1,000 cattle. This will facilitate research on identifying genomic features related to economically important traits (http://www.beefcrc.com.au/Assets/819/l/BeefBulletin-September201 17-9- llwebspreads.pdf). Given the importance of the bovine sequence to genetic gains in the dairy industry, new technologies and novel assembly methods are needed to bring the cow genome annotation to a more complete state and to provide a faster, cost-efficient way of sequencing other cattle breeds. Such sequencing projects could help explain variation in disease resistance and lead to improved breeding programs.
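The fold-coverage figures quoted for these assemblies (e.g. 9.5x for UMD3.1 over 2.86 billion base pairs) follow from simple arithmetic: mean coverage C = N × L / G for N reads of length L over a genome of size G, and under the Lander-Waterman (Poisson) model the expected fraction of bases covered by no read is e^(-C). A minimal Python sketch under those idealized assumptions (uniform random read placement; the 100 bp read length and 30x target in the example are hypothetical, and real data fall short of this model in repeats and GC-rich regions):

```python
import math


def fold_coverage(n_reads: int, read_len: int, genome_size: int) -> float:
    """Mean coverage C = N * L / G."""
    return n_reads * read_len / genome_size


def reads_needed(target_cov: float, genome_size: int, read_len: int) -> int:
    """Smallest number of reads giving at least `target_cov` mean coverage."""
    return math.ceil(target_cov * genome_size / read_len)


def fraction_uncovered(cov: float) -> float:
    """Lander-Waterman (Poisson) expected fraction of bases with zero reads."""
    return math.exp(-cov)


cow_genome = 2_860_000_000  # UMD3.1 assembly size quoted in the text
print(f"9.5x needs about {9.5 * cow_genome / 1e9:.1f} Gb of sequence")
print(reads_needed(30, cow_genome, 100))  # 858000000 hypothetical 100 bp reads
print(fraction_uncovered(6.6), fraction_uncovered(9.5))
```

For instance, 9.5x coverage of a 2.86 Gb genome corresponds to roughly 27 Gb of raw sequence, and the idealized uncovered fraction drops from about 0.1% at 6.6x to under 0.01% at 9.5x.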

The interim sheep assembly OARv2.0 was released recently [12], with the goal of identifying genes associated with production, quality, and disease traits in sheep (http://www.sheephapmap.org/). OARv3.0 is projected to be released in late 2011; it is expected to fill chromosomal gaps and to correctly assign many of the sequences left unassigned in v2.0 to chromosomes. Transcriptomic and SNP datasets are also expected in the new release (http://sheephapmap.org/news/ScheduledOARv3.php).

The horse is a model organism for studying biomechanics and exercise physiology (http://www.ncbi.nlm.nih.gov/projects/genome/guide/horse/). The horse sequence is also important for helping veterinarians develop new therapies for equine laminitis and respiratory diseases. In recent years, there has been progress in identifying mutations in horse genes related to morphology, immunology, and metabolism [13].

Sequencing details for the domestic animals mentioned above are listed in Table 1.

Bai et al. Journal of Animal Science and Biotechnology 2012, 3:8
http://www.jasbsci.com/content/3/1/8

Table 1 Various sequenced livestock genomes

Animal | Species | Genome size | Sequencing methods | Recent release version | Sequencing center
Chicken | Gallus gallus | 1.2 Gb (39 chromosome pairs) | Bacterial artificial chromosome (BAC), fosmid, and plasmid-based whole-genome shotgun (WGS) | NCBI build 2.1 | Washington University Genome Sequencing Center
Pig | Sus scrofa | 2.7 Gb (18 autosomes, X and Y sex chromosomes) | Clone based | NCBI build 3.1 | The Swine Genome Sequencing Consortium
Cow | Bos taurus/indicus | 2.86 billion base pairs | Mixture of hierarchical and whole-genome shotgun | UMD_3.* | The original sequencing was conducted at the Baylor College of Medicine in Houston, Texas, but the genome was reassembled by the Salzberg lab in Baltimore, Maryland
Cow | Bos taurus/indicus | 2.86 billion base pairs | 7.15x mixed assembly of whole-genome shotgun and BAC sequence | Btau_4.2 | Bovine Genome Sequencing Project led by the Baylor College of Medicine's Human Genome Sequencing Center in Houston, Texas
Sheep | Ovis aries | 2.71 Gb (91% of sheep genome) | WGS | OARv2.0 (working draft) | International Sheep Genomics Consortium
Horse | Equus caballus | 2.4-2.7 Gb | 6.79x WGS | EquCab2.0 | The Broad Institute and the Horse Genome Project

2. Next Generation Sequencing Technologies

Next generation sequencing (NGS) technologies, which use modern platforms to produce very large numbers of sequence fragments, have revolutionized research in the genetic and biomedical fields and have become increasingly popular in recent years. Several massively parallel platforms are currently in widespread use in sequencing centers and laboratories. These include the Illumina (formerly Solexa) Genome Analyzer and HiSeq (http://www.illumina.com), the Roche/454 FLX (http://www.454.com), and the Applied Biosystems SOLiD™ System (http://www.appliedbiosystems.com). These platforms can generate millions to billions of reads in a single run, with read lengths ranging from 50 to 500 bp. The technologies differ in many parameters, such as the clonal amplification method, the instrument, the sequencing enzyme/chemistry, and the read length generated. Because the number of reads produced and the sequencing speed differ among technologies, so does the rate of data generation. Current Illumina HiSeq technology can generate 150 to 200 Gb of data as 100 bp paired-end reads in 8 days. Base call accuracy also varies between platforms (http://kevin-gattaca.blogspot.com/2010/04/comparing-ngs-platforms-454-solexa.html).
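Base call accuracy is usually reported per base as a Phred quality score Q, where the estimated error probability is p = 10^(-Q/10): Q20 means a 1% chance that the call is wrong and Q30 a 0.1% chance. The Python sketch below decodes a FASTQ quality string into scores and error probabilities. It assumes the Phred+33 ASCII encoding of Sanger-style FASTQ; some older Illumina pipelines used an offset of 64 instead, exposed here as the `offset` parameter. The helper names are illustrative.

```python
def phred_to_error(q: int) -> float:
    """Error probability p = 10^(-Q/10) for Phred quality Q."""
    return 10 ** (-q / 10)


def decode_quals(qual: str, offset: int = 33) -> list[int]:
    """Convert a FASTQ quality string to Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in qual]


def mean_error_rate(qual: str, offset: int = 33) -> float:
    """Average per-base error probability of one read (assumes a non-empty string)."""
    errors = [phred_to_error(q) for q in decode_quals(qual, offset)]
    return sum(errors) / len(errors)


print(decode_quals("I5+"))  # [40, 20, 10]
print(mean_error_rate("I5+"))
```

Summarizing qualities this way is one simple means of comparing runs from different platforms before alignment.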

Several cutting-edge applications, such as targeted exome capture (exome sequencing), chromatin immunoprecipitation sequencing (ChIP-Seq), and whole transcriptome shotgun sequencing (RNA-Seq), have been developed for different biological purposes. Exome sequencing [14] reduces the high cost of sequencing the whole genome by excluding intronic regions and selectively sequencing the exonic regions, which may be of more immediate interest. ChIP-Seq [15] identifies the genome-wide binding pattern of a protein of interest, such as a transcription factor, and is a powerful approach for studying protein-DNA interactions. RNA-Seq [16,17], or transcriptome-wide sequencing, uses NGS technologies to sequence cDNAs derived from RNA samples.
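ChIP-Seq peak callers ask whether a genomic window contains more reads than expected by chance. The simplified Python sketch below illustrates the idea behind the dynamic Poisson model that Table 2 attributes to MACS: the expected rate λ for a window is taken as the largest of a genome-wide background rate and local control rates, and enrichment is scored by the Poisson upper-tail probability P(X ≥ k). The per-base rates, window width, and function names are hypothetical inputs for illustration; this is not the MACS implementation.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed directly so small p-values keep precision."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))  # P(X = k)
    total, i = 0.0, k
    while True:
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
        if i > lam and term <= total * 1e-16:  # remaining terms are negligible
            break
    return min(total, 1.0)


def enrichment_pvalue(treat_count: int, window_bp: int,
                      genome_rate: float, local_rates: list[float]) -> float:
    """Score one window: lambda is the largest per-bp rate, scaled to the window width."""
    lam = max([genome_rate, *local_rates]) * window_bp
    return poisson_upper_tail(treat_count, lam)


# Hypothetical 200 bp window with 15 ChIP reads; local control rates exceed background.
print(enrichment_pvalue(15, 200, 0.01, [0.015, 0.02]))
```

Taking the maximum of the candidate rates is conservative: a locally noisy region raises λ, so it needs more reads before it is called a peak.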

To reveal variation among different strains or large populations of related samples, any of the above NGS techniques can be employed because of advantages such as a high efficiency-to-cost ratio. According to the National Human Genome Research Institute (NHGRI) (http://genome.gov/sequencingcosts), the cost of DNA sequencing was under 50 cents per megabase, and the cost per genome was estimated at $11,000, in March 2011. Sequence mutations and structural variations are commonly searched for in targeted sequencing data (exomes or whole genes). Popular SNP detection tools include SNVMix [18], SAMtools [19], and the GATK application package [20,21]. Tools and methods for detecting structural variation (including copy number variation), such as CNV-seq [22], SLOPE [23], SVDetect [24], and associated statistical methods, have been developed in recent years to identify INDELs, tandem duplications, and other genetic variations.
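Variant callers such as SNVMix, SAMtools, and GATK start from per-position summaries of aligned reads. As a toy illustration only, the Python sketch below parses one line of samtools pileup text (chromosome, position, reference base, depth, read bases, qualities), counts the alleles in the read-base column, and reports the most frequent non-reference allele when it passes hypothetical depth and allele-fraction thresholds. It ignores base qualities and indel alleles, and it is not the probabilistic model used by any of the tools above.

```python
import re
from collections import Counter


def count_alleles(ref: str, bases: str) -> Counter:
    """Count alleles in the read-base column of a samtools pileup line."""
    ref = ref.upper()
    counts: Counter = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":  # start of a read; the next character is its mapping quality
            i += 2
            continue
        if c == "$":  # end of a read
            i += 1
            continue
        if c in "+-":  # indel written as +<len><seq> or -<len><seq>; skipped here
            digits = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(digits) + int(digits)
            continue
        if c in ".,":  # match to the reference on the forward / reverse strand
            counts[ref] += 1
        elif c.upper() in "ACGTN":  # mismatch; case only encodes the strand
            counts[c.upper()] += 1
        i += 1  # '*' (deleted base) and other symbols are ignored in this sketch
    return counts


def naive_snv_call(line: str, min_depth: int = 8, min_alt_frac: float = 0.2):
    """Return (chrom, pos, ref, alt, alt_fraction) or None; thresholds are illustrative."""
    chrom, pos, ref, _depth, bases = line.rstrip("\n").split("\t")[:5]
    ref = ref.upper()
    counts = count_alleles(ref, bases)
    total = sum(counts.values())
    if total < min_depth:
        return None
    alts = [(n, base) for base, n in counts.items() if base != ref]
    if not alts:
        return None
    alt_n, alt = max(alts)
    frac = alt_n / total
    return (chrom, int(pos), ref, alt, frac) if frac >= min_alt_frac else None


print(naive_snv_call("chr1\t100\tA\t10\t...,,,GGgg\tIIIIIIIIII"))
# ('chr1', 100, 'A', 'G', 0.4)
```

Real callers replace the fixed allele-fraction cutoff with a statistical model of sequencing error and genotype, which is why they also need base and mapping qualities.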

RNA-Seq is becoming a popular method for quantitative gene expression studies [25]. However, accurate estimation of gene expression requires accurate genome annotation [26]. With complete or nearly complete annotated reference genomes, RNA-Seq can help researchers identify differentially expressed genes and novel transcripts in agricultural animals quantitatively and efficiently. RNA-Seq not only helps agricultural researchers select genes that are differentially expressed between samples under different treatment conditions, which could be crucial for certain traits or for disease resistance, but can also reveal isoforms that are absent from the annotation of the reference (template) assembly. Popular differential expression testing tools for RNA-Seq data include edgeR [27] and DEGSeq [28]. Powerful splice junction identification tools include Cufflinks [29]/TopHat [30] and Supersplat [31]. RNA-Seq can also help researchers annotate the transcription of the genome completely at different developmental stages [26].
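Before any statistical testing, RNA-Seq read counts are usually normalized for gene length and sequencing depth. The sketch below computes RPKM (reads per kilobase of transcript per million mapped reads) and a log2 fold change with a pseudocount between two conditions. It is a descriptive calculation only: tools such as edgeR and DEGSeq instead fit statistical models to the raw counts to test for differential expression. The function names, the pseudocount of 1, and the example counts are illustrative assumptions.

```python
import math


def rpkm(count: int, gene_len_bp: int, total_mapped: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return count * 1e9 / (gene_len_bp * total_mapped)


def log2_fold_change(a: float, b: float, pseudocount: float = 1.0) -> float:
    """log2((b + c) / (a + c)); the pseudocount avoids division by zero."""
    return math.log2((b + pseudocount) / (a + pseudocount))


# Hypothetical counts for one 2 kb gene in a control and a treated library.
control = rpkm(500, 2_000, 10_000_000)    # 25.0
treated = rpkm(1_800, 2_000, 12_000_000)  # 75.0
print(control, treated, log2_fold_change(control, treated))
```

Normalizing by library size matters here: the treated library is 20% larger, so its raw count alone would overstate the change.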

Table 2 lists a selection of current popular NGS tools and algorithms, with a description of the biological applications each serves.

Table 2 Selected variant calling, RNA-Seq, and ChIP-Seq software/tools and database links

Name | Description | Features/Restrictions | Link
SNVMix | Detects single nucleotide variants from next generation sequencing data | Input files are in Maq or SAMtools pileup format | http://www.bcgsc.ca/platform/bioinfo/software/SNVMix
SAMtools | Manipulates alignments in the SAM format (sorting, merging, indexing, etc.) | Free; designed for multiple uses | http://samtools.sourceforge.net/
GATK | Contains modules for depth of coverage analysis, quality score recalibration, SNP/indel calling, and local realignment | Java based; requires sorted, indexed BAM alignment files and a FASTA-format reference with associated index files | http://www.broadinstitute.org/gsa/wiki/index.php/The_Genome_Analysis_Toolkit
ERANGE (RNA-Seq) | A Python package that uses the Cistematic package | Free; flexible choice of input parameters | http://woldlab.caltech.edu/rnaseq/
Illumina (RNA-Seq) | Counts can be visualized and analyzed in Illumina's GenomeStudio viewer | License required; more robust (requires the contents of Illumina's output directory) | http://www.illumina.com/
TopHat | Fast splice junction mapper | Input files can be in either FASTQ or FASTA format | http://tophat.cbcb.umd.edu/
Cufflinks | Assembles transcripts and estimates their abundances from RNA-Seq data | Input alignment files are in SAM format; requires a reference annotation GTF file | http://cufflinks.cbcb.umd.edu/
ERANGE (ChIP-Seq) | Studies protein-DNA interactions | Free | http://woldlab.caltech.edu/erange/README.chip-seq
HPeak | Accurately pinpoints regions to which significantly more sequence reads are mapped | Hidden Markov model-based approach | http://www.sph.umich.edu/csg/qin/HPeak/
MACS | Uses a dynamic Poisson distribution to capture local biases in the genome sequence, allowing more sensitive and robust prediction | Publicly available open-source software for ChIP-Seq analysis with or without control samples | http://liulab.dfci.harvard.edu/MACS/
CisGenome | An integrated tool for tiling array, ChIP-Seq, genome, and cis-regulatory element analysis | N.A. | http://www.biostat.jhsph.edu/~hji/cisgenome/

3. Challenges and Perspectives for Livestock Sequencing Research

From raw draft assemblies to full-length cDNA/EST resources and BAC libraries, the genomes of livestock species have been substantially annotated in recent years. The impact of sequencing agricultural animals has expanded far beyond the original goals of serving as models for human health issues and physiological phenomena, to improving our understanding of the human genome and studying traits of economic and biological interest for livestock production. We are now at the beginning of an era in which genome sequencing analysis of livestock will allow the study of domestication, the selection of better breeds (e.g. high fertility), and an understanding of quantitative differences due to environmental factors (e.g. nutrition). Gene-gene and gene-environment interactions could be studied quantitatively using modern bioinformatics tools. Sequencing individual animal genomes, or regions of interest, under different treatment conditions will benefit the agricultural community by guiding experimental design and animal disease control and prevention.

Livestock animals are a major source of meat, eggs, and dairy (protein) for human beings. Reducing the use of chemicals and antibiotics and improving genetic resistance to pathogens are becoming increasingly important to the public and to agricultural scientists [1]. These new goals are too time-consuming and/or costly to achieve with traditional genetic approaches. NGS technologies will enable breakthroughs in genetic studies by shortening sequencing time and decreasing cost, and will reveal more of the genetic diversity of many commercial breeds with short turnaround times. For example, NGS can sequence mutant lines much more efficiently. By identifying genes and proteins associated with desirable traits (disease resistance and/or high milk, egg, or meat production), researchers could better control selection, which will in turn improve both productivity and animal welfare. Sequencing individual agricultural animals will increase opportunities for resisting animal pathogens that can challenge meat, egg, and dairy production. Because domestic animals are the leading source of animal protein for human beings, this sequencing research will provide valuable information for the efficient production of a leaner, healthier, and more economical source of animal protein for human consumption.

The breeding of farm animals is entering the post-genome era [32]. Despite some deficiencies of NGS, e.g. poor coverage of GC-rich regions and the difficulty of assembly when no good reference genome is available, NGS technologies (RNA-Seq, ChIP-Seq, and genome resequencing) can still help animal scientists study individual genomes far more quickly than was previously possible. We believe that sequencing individual animals treated under different conditions shows great promise. Sequencing microorganisms and parasites in agricultural animals' organs can also help veterinarians develop new vaccines and therapeutics [32]. NGS will also facilitate the study of gene expression and of the regulatory mechanisms underlying milk production and egg/meat flavor in animals. By using NGS approaches and tools, researchers can identify and further analyze individual genes controlling or affecting economic traits in agricultural animals, which will eventually benefit consumers.